Development of Telugu-Tamil Transfer-Based Machine Translation System: With Special Reference to Divergence Index
Abstract
Translation divergence precludes straightforward mapping in a machine translation (MT) system. As the number of divergences increases, so does the complexity of the system, especially in linguistically motivated transfer-based MT. In other words, the complexity of MT grows in proportion to the number of divergences. We propose a divergence index (DI) that quantifies the number of parametric variations between two languages and thereby helps improve the success rate of MT. This paper shows how to build a divergence index for a given language pair, with examples from Telugu and Tamil, two major Dravidian languages spoken in South India. It also proposes strategies for handling these divergences. The presentation of the paper includes a live demo of Telugu-Tamil MT.
Similar resources
A Study on Divergence in Malayalam and Tamil Language in Machine Translation Perceptive
Machine translation has made significant achievements over the past decades. However, in many languages, rich inflection and agglutination pose many challenges, which has forced manual translation to make corpora available. Divergence at the lexical, syntactic and semantic levels in any pair of languages makes machine translation more difficult. And many systems still depend ...
PanchBhoota: Hierarchical Phrase Based Machine Translation Systems for Five Indian Languages
We present our work on developing fifteen Hierarchical Phrase Based Statistical Machine Translation (HPBSMT) systems for five Indian language pairs, namely Bengali-Hindi, English-Hindi, Marathi-Hindi, Tamil-Hindi, and Telugu-Hindi, in three domains each: HEALTH, TOURISM and GENERAL. We named them PanchBhoota, as these systems are elemental in nature. We used a very simple approach to train, tune,...
Statistical Machine Translation for Indian Languages: Mission Hindi 2
This paper presents Centre for Development of Advanced Computing Mumbai’s (CDACM) submission to NLP Tools Contest on Statistical Machine Translation in Indian Languages (ILSMT) 2015 (collocated with ICON 2015). The aim of the contest was to collectively explore the effectiveness of Statistical Machine Translation (SMT) while translating within Indian languages and between English and Indian lan...
Rule Based Case Transfer in Tamil-Malayalam Machine Translation
The paper focuses on the rule based case transfer, which is a part of the transfer grammar module developed for bidirectional Tamil to Malayalam Machine Translation system. The present study involves two typologically close and genetically related languages, namely Tamil and Malayalam. We considered the basic construction of sentences which is highly dependent on the case systems. The rules wer...
Combining Bilingual and Comparable Corpora for Low Resource Machine Translation
Statistical machine translation (SMT) performance suffers when models are trained on only small amounts of parallel data. The learned models typically have both low accuracy (incorrect translations and feature scores) and low coverage (high out-of-vocabulary rates). In this work, we use an additional data resource, comparable corpora, to improve both. Beginning with a small bitext and correspon...